[fix](paimon-cpp) deduplicate Arrow linking to fix SIGSEGV in FilterRowGroupsByPredicate #60883

Open
xylaaaaa wants to merge 2 commits into apache:master from xylaaaaa:fix/paimoncpp-dedup-arrow-linking

@xylaaaaa
Contributor

Proposed changes

Problem

When ENABLE_PAIMON_CPP is ON, both Doris's own libarrow.a and paimon-cpp's libarrow.a are linked into doris_be, causing 3698 duplicate global symbols. This leads to SIGSEGV crashes in paimon::parquet::ParquetFileBatchReader::FilterRowGroupsByPredicate when libarrow_dataset.a resolves arrow core calls to the wrong copy (compiled with different feature flags).

Both are Arrow 17.0.0 but compiled with different options:

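| Feature    | Doris Arrow | paimon Arrow |
|------------|-------------|--------------|
| COMPUTE    | OFF         | ON           |
| DATASET    | OFF         | ON           |
| ACERO      | OFF         | ON           |
| FILESYSTEM | OFF         | ON           |
| FLIGHT     | ON          | OFF          |
| FLIGHT_SQL | ON          | OFF          |
| PARQUET    | ON          | ON           |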
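
For readers unfamiliar with how these features are toggled, the rows above roughly correspond to Arrow's `ARROW_*` CMake options. The fragment below is a sketch of the paimon-side column written as an initial-cache script. Using an initial-cache script at all, and the description strings, are assumptions for illustration; this is not the content of either project's build scripts.

```cmake
# Sketch of an initial-cache script (loadable with `cmake -C <file>`).
# Values mirror the "paimon Arrow" column; how paimon-cpp actually passes
# these options to the Arrow build is an assumption.
set(ARROW_COMPUTE    ON  CACHE BOOL "Build Arrow compute kernels")
set(ARROW_DATASET    ON  CACHE BOOL "Build the Arrow Dataset module")
set(ARROW_ACERO      ON  CACHE BOOL "Build the Acero execution engine")
set(ARROW_FILESYSTEM ON  CACHE BOOL "Build the Arrow filesystem layer")
set(ARROW_FLIGHT     OFF CACHE BOOL "Build Arrow Flight RPC")
set(ARROW_FLIGHT_SQL OFF CACHE BOOL "Build Arrow Flight SQL")
set(ARROW_PARQUET    ON  CACHE BOOL "Build the Parquet library")

# The "Doris Arrow" column would flip the first six values:
# COMPUTE/DATASET/ACERO/FILESYSTEM OFF, FLIGHT/FLIGHT_SQL ON.
```

Because both configurations produce an archive named `libarrow.a` from the same Arrow version, nothing in the file name distinguishes the two builds at link time.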

Crash Stack

SIGSEGV invalid permissions for mapped object
 → std::string::basic_string(char const*, ...)
 → paimon::ToPaimonStatus(arrow::Status const&)
 → paimon::parquet::ParquetFileBatchReader::FilterRowGroupsByPredicate(...)

Root Cause

Inside -Wl,--start-group ... --end-group, the linker may resolve undefined references from paimon's libarrow_dataset.a against Doris's libarrow.a, which was compiled without the COMPUTE/FILESYSTEM modules. The two builds do not share the same internal object memory layout, so arrow::Status and other objects trigger illegal memory accesses when they are passed between code compiled against one copy and code from the other.

Fix

When the paimon_deps Arrow stack is selected (because Doris lacks libarrow_dataset.a / libarrow_acero.a), remove Doris's arrow from COMMON_THIRDPARTY.

paimon's libarrow.a is a superset of Doris's version (same 17.0.0, with additional modules enabled), so it provides all symbols needed by Doris's libarrow_flight.a / libarrow_flight_sql.a.
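
The removal step described above could look roughly like the following in `be/CMakeLists.txt`. This is a minimal sketch, not the PR's actual diff: it assumes the result of the stack selection is stored in `_selected_arrow_stack` (the variable name used in the reviewed snippet) and that Doris's core Arrow library appears in `COMMON_THIRDPARTY` under the name `arrow`.

```cmake
# Sketch only; the real change in be/CMakeLists.txt may be structured differently.
if (ENABLE_PAIMON_CPP AND _selected_arrow_stack STREQUAL "paimon_deps")
    # Drop Doris's libarrow.a so that only paimon_deps' copy of Arrow core
    # is linked into doris_be. REMOVE_ITEM is a no-op if "arrow" is absent.
    list(REMOVE_ITEM COMMON_THIRDPARTY arrow)
    message(STATUS "paimon-cpp: using paimon_deps Arrow stack; "
                   "Doris arrow removed from COMMON_THIRDPARTY")
endif()
```

Guarding on both conditions keeps the `ENABLE_PAIMON_CPP=OFF` build untouched, which matches the impact statement in the description.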

Impact

  • Only be/CMakeLists.txt changed (~10 lines).
  • No C++/Java business code changes.
  • No impact when ENABLE_PAIMON_CPP=OFF.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

@xylaaaaa requested a review from zclllyybb as a code owner on February 27, 2026 09:05
Copilot AI review requested due to automatic review settings February 27, 2026 09:05
@Thearas
Contributor

Thearas commented Feb 27, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI left a comment

Pull request overview

This PR fixes SIGSEGV crashes in paimon-cpp's ParquetFileBatchReader when ENABLE_PAIMON_CPP is ON. The crash was caused by linking both Doris's libarrow.a and paimon-cpp's libarrow.a into the binary, creating 3698 duplicate global symbols. Although both are Arrow 17.0.0, they were compiled with different feature flags (Doris: FLIGHT enabled, paimon: COMPUTE/DATASET/ACERO/FILESYSTEM enabled), causing memory layout incompatibilities that led to crashes when arrow_dataset resolved symbols to the wrong copy.

Changes:

  • Implement stack-based Arrow library selection logic that chooses either the complete Doris or paimon_deps Arrow stack
  • When paimon_deps stack is selected, remove Doris's arrow from COMMON_THIRDPARTY to eliminate duplicate symbols
  • Add status messages to indicate which Arrow stack is being used
Comments suppressed due to low confidence (1)

be/CMakeLists.txt:634

  • The comment states "mixing different Arrow versions" but both Doris and paimon use Arrow 17.0.0 according to the PR description. Consider updating the comment to clarify that the issue is mixing the same Arrow version compiled with different feature flags, not different versions.
    # mixing different Arrow versions (e.g. Doris core + paimon dataset/acero),

Comment on lines +678 to +689
if (_doris_arrow_core AND _doris_arrow_dataset AND _doris_arrow_acero)
    set(_selected_arrow_stack "doris")
    set(_selected_arrow_core "${_doris_arrow_core}")
    set(_selected_arrow_dataset "${_doris_arrow_dataset}")
    set(_selected_arrow_acero "${_doris_arrow_acero}")
    set(_selected_arrow_filesystem "${_doris_arrow_filesystem}")
elseif (_paimon_arrow_core AND _paimon_arrow_dataset AND _paimon_arrow_acero)
    set(_selected_arrow_stack "paimon_deps")
    set(_selected_arrow_core "${_paimon_arrow_core}")
    set(_selected_arrow_dataset "${_paimon_arrow_dataset}")
    set(_selected_arrow_acero "${_paimon_arrow_acero}")
    set(_selected_arrow_filesystem "${_paimon_arrow_filesystem}")

Copilot AI Feb 27, 2026

The selection logic prioritizes Doris's Arrow stack over paimon_deps when both are complete. If Doris's build configuration changes in the future to include DATASET and ACERO modules, this could cause the same duplicate symbol issues this PR is fixing, because paimon's code would be linked against Doris's Arrow instead of paimon_deps's Arrow. Consider adding a comment explaining this priority decision, or if paimon_deps should always be preferred when ENABLE_PAIMON_CPP is ON, swap the priority order.
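
If the second option were taken, the branch order could be swapped roughly as follows. This is a sketch of the reviewer's suggestion, not code from the PR; the `else()` branch with `FATAL_ERROR` is an added assumption about how an incomplete stack should be handled.

```cmake
# Sketch: prefer the paimon_deps Arrow stack whenever it is complete, so that
# paimon-cpp always links against the Arrow build it was compiled with.
if (_paimon_arrow_core AND _paimon_arrow_dataset AND _paimon_arrow_acero)
    set(_selected_arrow_stack "paimon_deps")
    set(_selected_arrow_core "${_paimon_arrow_core}")
    set(_selected_arrow_dataset "${_paimon_arrow_dataset}")
    set(_selected_arrow_acero "${_paimon_arrow_acero}")
    set(_selected_arrow_filesystem "${_paimon_arrow_filesystem}")
elseif (_doris_arrow_core AND _doris_arrow_dataset AND _doris_arrow_acero)
    # Fallback: Doris's own stack, used only when paimon_deps is incomplete.
    set(_selected_arrow_stack "doris")
    set(_selected_arrow_core "${_doris_arrow_core}")
    set(_selected_arrow_dataset "${_doris_arrow_dataset}")
    set(_selected_arrow_acero "${_doris_arrow_acero}")
    set(_selected_arrow_filesystem "${_doris_arrow_filesystem}")
else()
    # Assumption: fail early rather than link a mixed Arrow stack.
    message(FATAL_ERROR "paimon-cpp: no complete Arrow stack (core + dataset + acero) found")
endif()
message(STATUS "paimon-cpp: selected Arrow stack: ${_selected_arrow_stack}")
```

Either ordering selects a single complete stack; the difference is only which one wins when both are available.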

morningman pushed a commit that referenced this pull request Mar 2, 2026
…xternal headers (#60946)

## Summary
Split thirdparty-only changes from #60883 into an independent PR, so
`thirdparty` can merge first.

## Included Files
- `thirdparty/build-thirdparty.sh`
- `thirdparty/download-thirdparty.sh`
- `thirdparty/paimon-cpp-cache.cmake`
- `thirdparty/patches/apache-arrow-17.0.0-paimon.patch`
- `thirdparty/patches/paimon-cpp-buildutils-static-deps.patch`

## Why Split
- Keep this PR focused on `thirdparty` integration only.
- Reduce rebase/conflict risk for the original feature branch.

## Follow-up
1. Merge this PR first.
2. Rebase the original feature branch on latest `master`.
3. Keep non-thirdparty logic in the original PR.
xylaaaaa added 2 commits March 2, 2026 20:27
…owGroupsByPredicate

When ENABLE_PAIMON_CPP is ON, both Doris's own libarrow.a and paimon-cpp's
libarrow.a were linked into doris_be, causing 3698 duplicate global symbols.
This led to SIGSEGV crashes in paimon::parquet::ParquetFileBatchReader::
FilterRowGroupsByPredicate when libarrow_dataset.a resolved arrow core calls
to the wrong copy (compiled with different feature flags).

Both are Arrow 17.0.0 but compiled with different options:
- Doris:  COMPUTE=OFF, DATASET=OFF, ACERO=OFF, FLIGHT=ON
- paimon: COMPUTE=ON,  DATASET=ON,  ACERO=ON,  FLIGHT=OFF

Fix: when paimon_deps Arrow stack is selected, remove Doris's 'arrow' from
COMMON_THIRDPARTY. paimon's libarrow.a is a superset and provides all symbols
needed by Doris's arrow_flight / arrow_flight_sql.
@xylaaaaa force-pushed the fix/paimoncpp-dedup-arrow-linking branch from 4041b22 to 4a6ddee on March 2, 2026 12:29
@xylaaaaa
Contributor Author

xylaaaaa commented Mar 2, 2026

run buildall

1 similar comment
@xylaaaaa
Contributor Author

xylaaaaa commented Mar 4, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 28814 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4a6ddee0c5455e597777f18f7ccfa5811a02272f, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17638	4477	4326	4326
q2	q3	10645	812	537	537
q4	4675	372	269	269
q5	7548	1203	1021	1021
q6	177	175	146	146
q7	778	839	667	667
q8	9282	1457	1319	1319
q9	4849	4667	4713	4667
q10	6826	1879	1650	1650
q11	467	255	232	232
q12	711	573	475	475
q13	17772	4226	3426	3426
q14	239	233	214	214
q15	933	785	788	785
q16	768	719	677	677
q17	710	862	422	422
q18	5776	5393	5200	5200
q19	1108	967	622	622
q20	514	499	390	390
q21	4654	1966	1508	1508
q22	347	305	261	261
Total cold run time: 96417 ms
Total hot run time: 28814 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4729	4535	4652	4535
q2	q3	1803	2197	1811	1811
q4	883	1191	799	799
q5	4058	4356	4304	4304
q6	183	171	144	144
q7	1747	1635	1538	1538
q8	2507	2700	2591	2591
q9	7608	7529	7535	7529
q10	2626	2884	2373	2373
q11	495	431	406	406
q12	486	593	451	451
q13	4139	4410	3642	3642
q14	278	339	344	339
q15	843	797	798	797
q16	734	795	737	737
q17	1204	1725	1341	1341
q18	7198	6869	6617	6617
q19	917	886	863	863
q20	2101	2186	2034	2034
q21	3954	3505	3323	3323
q22	450	439	392	392
Total cold run time: 48943 ms
Total hot run time: 46566 ms

@doris-robot

TPC-DS: Total hot run time: 183851 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4a6ddee0c5455e597777f18f7ccfa5811a02272f, data reload: false

query5	4356	619	534	534
query6	337	230	212	212
query7	4228	486	271	271
query8	354	263	233	233
query9	8777	2732	2745	2732
query10	525	396	351	351
query11	16970	16773	16527	16527
query12	191	130	126	126
query13	1264	451	353	353
query14	6091	3212	3005	3005
query14_1	2860	2833	2784	2784
query15	203	193	183	183
query16	978	396	449	396
query17	1105	720	603	603
query18	2444	447	355	355
query19	213	208	191	191
query20	143	130	133	130
query21	231	149	120	120
query22	4935	6223	5529	5529
query23	17576	17189	16880	16880
query23_1	17299	17012	17116	17012
query24	7220	1612	1225	1225
query24_1	1234	1253	1261	1253
query25	567	497	430	430
query26	1244	268	156	156
query27	2778	475	299	299
query28	4484	1881	1884	1881
query29	794	565	451	451
query30	309	248	203	203
query31	863	723	650	650
query32	87	71	68	68
query33	517	356	288	288
query34	927	907	560	560
query35	627	692	587	587
query36	1104	1086	1018	1018
query37	128	95	83	83
query38	2975	2923	2930	2923
query39	879	870	849	849
query39_1	837	847	825	825
query40	227	150	133	133
query41	62	60	57	57
query42	108	104	99	99
query43	370	383	361	361
query44	
query45	198	191	179	179
query46	869	988	609	609
query47	2093	2105	1992	1992
query48	306	315	227	227
query49	624	468	381	381
query50	675	325	215	215
query51	4111	4075	4128	4075
query52	107	108	97	97
query53	288	340	281	281
query54	296	262	258	258
query55	85	82	81	81
query56	299	308	306	306
query57	1369	1350	1275	1275
query58	282	281	272	272
query59	2552	2648	2489	2489
query60	327	330	322	322
query61	153	145	148	145
query62	634	604	549	549
query63	309	279	278	278
query64	4898	1267	1007	1007
query65	
query66	1466	459	358	358
query67	16259	16360	16275	16275
query68	
query69	408	309	281	281
query70	946	1004	849	849
query71	333	307	294	294
query72	2675	2589	2411	2411
query73	538	537	315	315
query74	9966	9969	9746	9746
query75	2896	2758	2443	2443
query76	2298	1027	690	690
query77	363	393	315	315
query78	11137	11346	10627	10627
query79	2764	804	620	620
query80	1716	627	534	534
query81	579	274	246	246
query82	995	147	122	122
query83	326	261	241	241
query84	251	120	96	96
query85	891	456	417	417
query86	423	301	336	301
query87	3125	3102	3035	3035
query88	3541	2644	2632	2632
query89	429	374	341	341
query90	2010	173	167	167
query91	164	153	134	134
query92	83	82	73	73
query93	1150	809	515	515
query94	625	315	278	278
query95	570	386	313	313
query96	628	513	224	224
query97	2491	2503	2382	2382
query98	249	229	231	229
query99	996	967	884	884
Total cold run time: 254848 ms
Total hot run time: 183851 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.56% (19637/37358)
Line Coverage 36.18% (183285/506660)
Region Coverage 32.48% (142206/437860)
Branch Coverage 33.44% (61688/184487)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.12% (20894/36580)
Line Coverage 40.19% (203013/505107)
Region Coverage 36.81% (162677/441997)
Branch Coverage 37.49% (69367/185051)

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Mar 5, 2026
@github-actions
Contributor

github-actions bot commented Mar 5, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Mar 5, 2026

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 27511 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4a6ddee0c5455e597777f18f7ccfa5811a02272f, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17667	4586	4286	4286
q2	q3	10651	788	506	506
q4	4685	366	252	252
q5	7554	1205	1041	1041
q6	180	171	147	147
q7	778	836	658	658
q8	9295	1456	1348	1348
q9	4903	4788	4675	4675
q10	6359	1896	1646	1646
q11	462	251	237	237
q12	753	574	463	463
q13	18064	2932	2171	2171
q14	228	238	213	213
q15	931	796	806	796
q16	775	712	692	692
q17	696	855	424	424
q18	5968	5373	5167	5167
q19	1117	997	604	604
q20	500	490	391	391
q21	4520	2051	1502	1502
q22	357	337	292	292
Total cold run time: 96443 ms
Total hot run time: 27511 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4640	4631	4557	4557
q2	q3	3848	4393	3876	3876
q4	872	1172	767	767
q5	4088	4379	4331	4331
q6	198	181	140	140
q7	1820	1634	1503	1503
q8	2506	2690	2572	2572
q9	7562	7356	7382	7356
q10	3753	3976	3702	3702
q11	512	441	412	412
q12	501	586	462	462
q13	2705	3612	2323	2323
q14	280	298	288	288
q15	845	833	832	832
q16	728	795	727	727
q17	1150	1429	1400	1400
q18	7323	6721	6745	6721
q19	910	892	883	883
q20	2226	2153	1992	1992
q21	3984	3596	3349	3349
q22	470	406	389	389
Total cold run time: 50921 ms
Total hot run time: 48582 ms

@doris-robot

TPC-DS: Total hot run time: 153140 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4a6ddee0c5455e597777f18f7ccfa5811a02272f, data reload: false

query5	4343	642	535	535
query6	311	220	207	207
query7	4211	476	275	275
query8	330	251	243	243
query9	8726	2744	2726	2726
query10	509	363	339	339
query11	7371	5849	5644	5644
query12	195	132	136	132
query13	1270	464	373	373
query14	5770	3832	3612	3612
query14_1	2823	2811	2819	2811
query15	204	195	174	174
query16	995	449	452	449
query17	1107	725	622	622
query18	2440	462	359	359
query19	216	216	186	186
query20	135	130	128	128
query21	227	148	123	123
query22	4866	4767	4802	4767
query23	15951	15530	15333	15333
query23_1	15476	16176	15827	15827
query24	7366	1705	1290	1290
query24_1	1318	1438	1241	1241
query25	578	481	467	467
query26	1232	277	161	161
query27	2914	506	317	317
query28	4773	2031	1942	1942
query29	865	604	522	522
query30	345	275	231	231
query31	1461	1409	1289	1289
query32	81	78	75	75
query33	508	401	289	289
query34	954	943	595	595
query35	702	691	665	665
query36	1256	1202	982	982
query37	138	96	80	80
query38	2944	2997	2869	2869
query39	860	860	846	846
query39_1	852	820	817	817
query40	234	152	137	137
query41	67	61	56	56
query42	302	297	306	297
query43	242	253	220	220
query44	
query45	197	189	189	189
query46	868	975	604	604
query47	2106	2150	2035	2035
query48	301	314	227	227
query49	642	460	392	392
query50	666	274	220	220
query51	4146	4128	4013	4013
query52	289	291	280	280
query53	287	334	289	289
query54	296	276	258	258
query55	95	92	82	82
query56	318	354	311	311
query57	1360	1329	1276	1276
query58	283	273	279	273
query59	1300	1465	1269	1269
query60	327	340	328	328
query61	147	148	146	146
query62	625	583	547	547
query63	308	280	277	277
query64	5069	1274	982	982
query65	
query66	1481	458	346	346
query67	16279	16254	16292	16254
query68	
query69	398	311	300	300
query70	991	959	901	901
query71	336	305	291	291
query72	2776	2740	2577	2577
query73	557	563	322	322
query74	9962	9899	9771	9771
query75	2874	2774	2493	2493
query76	2295	1034	694	694
query77	357	402	328	328
query78	11143	11233	10624	10624
query79	3006	812	599	599
query80	1786	626	515	515
query81	591	289	236	236
query82	986	153	121	121
query83	347	262	248	248
query84	273	133	105	105
query85	959	518	441	441
query86	477	321	302	302
query87	3123	3081	3022	3022
query88	3559	2643	2645	2643
query89	421	373	342	342
query90	1977	178	180	178
query91	165	159	135	135
query92	78	74	67	67
query93	1447	862	518	518
query94	645	332	282	282
query95	581	334	375	334
query96	640	516	228	228
query97	2470	2528	2399	2399
query98	258	221	229	221
query99	995	991	907	907
Total cold run time: 236090 ms
Total hot run time: 153140 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 63.25% (23152/36603)
Line Coverage 46.47% (234889/505433)
Region Coverage 43.52% (192445/442237)
Branch Coverage 44.64% (82681/185222)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 63.24% (23149/36603)
Line Coverage 46.48% (234906/505433)
Region Coverage 43.56% (192645/442237)
Branch Coverage 44.65% (82705/185222)

Labels

approved (Indicates a PR has been approved by one committer.), reviewed

6 participants